We have analysed the organization of the 3′ end of the genomic RNA of canine coronavirus (CCV), a virus which has a close antigenic relationship to transmissible gastroenteritis virus (TGEV), porcine respiratory coronavirus (PRCV) and feline infectious peritonitis virus (FIPV). Genomic RNA isolated from CCV strain Insavc-1-infected A72 cells was used to generate a cDNA library. Overlapping clones, spanning approximately 9.6 kb [from the 3′ end of the polymerase gene, 1b, to the poly(A) tail] were identified. Sequencing and subsequent analyses revealed 10 open reading frames (ORFs). Three of these code for the major coronavirus structural polypeptides S, M and N; a fourth codes for a small membrane protein, SM, a putative homologue of the IBV structural polypeptide 3c, and five code for polypeptides, designated 1b, 3a, 4, 7a and 7b, homologous to putative non-structural polypeptides encoded in the TGEV or FIPV genomes. An extra ORF which had not hitherto been identified in this antigenic group of coronaviruses was designated 3x. Pairwise alignment of these ORFs with their counterparts in TGEV, PRCV and FIPV revealed high levels of identity and highlighted the close relationship between the members of this group of viruses.


Introduction


Canine coronavirus (CCV), a causative agent of enteritis in neonatal dogs, was first identified in 1971 (Binn et al., 1974). The disease is characterized by infection of the absorptive epithelium of the villi and the onset of diarrhoea followed by villus atrophy (Keenan et al., 1976). CCV belongs to the Coronaviridae, a family of enveloped viruses possessing a ssRNA genome of positive polarity. In infected cells, a set of 3′-coterminal subgenomic RNAs are produced and, as a result, the 5′ end of each mRNA contains unique sequence information not present on smaller RNAs in the nested set. Only this unique region of each mRNA is translated (reviewed by Spaan et al., 1988), thus the mRNAs are, in principle, functionally monocistronic. Nevertheless, some mRNAs contain two or more coding regions within the unique sequence and thus may be functionally bi- or tricistronic (Brierley et al., 1987; Liu et al., 1991; Liu & Inglis, 1992). The CCV virion is known to contain at least four protein species: the 204K spike glycoprotein, S; the 32K membrane glycoprotein, M; the 9.2K small membrane protein, SM; and the 50K nucleocapsid protein, N (Garwes & Reynolds, 1981; Godet et al., 1992).

CCV belongs to one of the major antigenic groups of coronaviruses (Siddell et al., 1983; Spaan et al., 1988) and is serologically related to feline infectious peritonitis virus (FIPV), feline enteric coronavirus (FECV), transmissible gastroenteritis virus (TGEV) and porcine respiratory coronavirus (PRCV) (Sanchez et al., 1990). These viruses have been distinguished mainly by their host species of origin. It has been reported, however, that some strains of CCV can also infect cats (Barlough et al., 1984; Stoddart et al., 1988) and swine without causing any apparent disease (Woods & Wesley, 1986). Likewise, TGEV can also infect other species (Woods & Pedersen, 1979; Norman et al., 1970) and FIPV can infect swine (Woods et al., 1981). This close relationship indicates that the viruses may have a common ancestor (Horzinek et al., 1982; Sanchez et al., 1990).

Molecular analysis has helped to elucidate some of the aspects of this phylogenetic relationship and some of the mechanisms involved in pathogenesis. TGEV, PRCV and FIPV have been characterized in some detail and the genes encoding the structural proteins have been cloned and sequenced (de Groot et al., 1987; Vennema et al., 1991; Britton et al., 1988a, b; Rasschaert & Laude, 1987; Rasschaert et al., 1990). A comparison of the available FIPV amino acid sequences with the corresponding sequences of TGEV and PRCV has revealed that the structural genes are very closely related. For S the identities were 81.6% (TGEV) and 76% (PRCV), for M 84.4% and 85.9%, and for N 77% and 75.6%, respectively. This contrasts greatly with the relationship to murine hepatitis virus (MHV), a prototypic coronavirus from another antigenic group, where the identities for these polypeptides are 24%, 30% and 27%, respectively (Schmidt et al., 1987; Skinner & Siddell, 1983; Armstrong et al., 1984). Despite this high degree of similarity amongst the structural proteins of these three viruses there are, nevertheless, differences at the 3′ end of their viral genomes and in their subgenomic message organisation.

CCV is the least characterized virus from this antigenic group. Here we report the cloning and sequencing of 9.6 kb from the 3′ end of the RNA of the avirulent CCV strain Insavc-1, subgenomic message analysis and comparison to available TGEV, PRCV and FIPV sequence data which illuminate the evolutionary relationship of this family of viruses. The presented sequence, which includes all of the CCV coding information except for the polymerase region, represents the first report of cloning and sequencing of a canine coronavirus.


Methods 


Virus and cells. Canine A72 cells and CCV strain Insavc-1 were obtained from Dr W. Baxendale (Intervet UK, Houghton, U.K.). A72 cells were grown in Gibco’s Wellcome formula, a modified Eagle’s medium supplemented with 10% foetal calf serum (FCS) containing penicillin (100 units/ml) and streptomycin (100 µg/ml) (MEM). Flasks (175 cm²) of A72 cells were washed with PBS and infected with CCV at an m.o.i. of 0.1 in 10 ml MEM. Virus adsorption was allowed to proceed for 60 min at 37 °C and the inoculum was then replaced by MEM-10% FCS.


Preparation of CCV genomic and messenger RNAs. CCV genomic RNA was prepared as follows. At 48 h post-infection (p.i.) the culture supernatant was harvested, chilled to 4 °C and the cell debris removed by low-speed centrifugation (3000 g for 15 min). Virus was pelleted from the supernatant at 53000 g for 2 h (Beckman type 19 rotor) and the pellet homogenized in 6 M-guanidinium isothiocyanate, 0.5% N-lauroyl sarcosinate, 5 mM-sodium citrate. The mixture was layered onto a 5.7 M-CsCl pad and viral RNA pelleted by centrifugation (108000 g for 12 h at 18 °C). The RNA was dissolved in 10 mM-Tris-HCl, 0.1 mM-EDTA (TE) containing 0.1% SDS and stored at −70 °C. Samples were analysed on a 1% Tris-borate-EDTA agarose gel containing 0.1% SDS. A single species of high Mr RNA was identified with the characteristic mobility of coronavirus genomic RNA.

Subgenomic RNAs were prepared in a similar manner. Briefly, at 36 h p.i. the infected cells were chilled to 4 °C, washed three times with ice-cold PBS then pelleted at 3000 g for 10 min. The cell pellet was homogenized in 6 M-guanidinium isothiocyanate, 0.5% N-lauroyl sarcosinate, 5 mM-sodium citrate then treated as described above.


Cloning of CCV genomic RNA 

(i) cDNA cloning. A cDNA library from CCV genomic RNA was prepared by reverse transcription after priming with oligo(dT) and random pentanucleotides, following the instructions and using the contents of the Boehringer Mannheim Biochemica cDNA synthesis kit. The resulting cDNA was blunt-ended using T4 DNA polymerase and ligated into the SmaI site of pUC119. Portions of the ligation mixture were transformed into Escherichia coli strain TG-1 and clones were identified by colour selection. Inserts of viral origin were confirmed by colony hybridization using cDNA prepared by random priming of CCV RNA as a probe. CCV-derived recombinant clones were analysed by restriction enzyme digestion and those containing inserts of 1.8 kb or greater in size were retained for further study.

(ii) Polymerase chain reaction (PCR). PCR-amplified fragments were obtained using cDNA:RNA heteroduplexes as template and oligonucleotides 7 and 8 (each of which contains a NotI site; Fig. 1) as primers. Taq DNA polymerase (Promega) was used to amplify the region of interest according to the recommendations of Sambrook et al. (1989) and 25 cycles (95 °C, 1 min; 60 °C, 1 min; and 72 °C, 2 min) were performed in a Techne PHC-1 machine. The generated DNA fragment was cleaved with NotI, gel-purified, ligated into the NotI site of pKL1 and transformed into E. coli strain TG-1. (pKL1 is a pUC-based vector with a modified polylinker and was a gift from Dr K. Law, University of Cambridge, U.K.)

Sequencing 

(i) M13 DNA sequencing. DNA sequencing was performed by Sanger’s dideoxynucleotide chain termination method as described by Bankier et al. (1987). Briefly, insert DNA was excised from vector sequences, self-ligated and sonicated in a cup-horn sonicator (Heat Systems, Ultrasonics). The sonicated DNA fragments were end-repaired with the Klenow fragment of E. coli DNA polymerase I and T4 DNA polymerase prior to size selection on a 1.2% agarose gel. Fragments in the size range 300 to 500 bp were purified and cloned into SmaI-digested, phosphatase-treated M13mp8. Shotgun sequence data were assembled using the SAP programs of Staden (1982) on a VAX 8350 and microVAX 3100 (Digital Equipment Corporation).

(ii) Supercoiled DNA sequencing. DNA templates were prepared as described by Lim & Pène (1988). CsCl-purified plasmid DNA (3 µg) was denatured with 0.15 M-NaOH and 0.15 mM-EDTA for 30 min at 37 °C, then centrifuged through a Sepharose CL-6B column equilibrated in TE. Sequencing reactions were carried out on the eluate as described using the pUC forward and reverse primers.

(iii) RNA sequencing. Primer (50 pmol) was annealed to either 1 µg genomic RNA or 10 µg total infected cell RNA at room temperature for 15 min. Sequencing reactions were performed as described by Fichot & Girard (1990).

Northern blot hybridization. Total RNA extracted from CCV-infected cells was denatured for 15 min at 56 °C in 50% deionized formamide, 2.2 M-formaldehyde and 0.5 mM-EDTA. The samples were cooled on ice after the addition of loading buffer containing 0.5% SDS, 0.025% bromophenol blue and 25% glycerol. The samples were electrophoresed overnight in a horizontal submerged gel containing 1.1 M-formaldehyde and 0.8% agarose. RNA was blotted from the gel to a nitrocellulose filter (Schleicher and Schuell). Prehybridization was carried out in 5 × SSC (1 × SSC is 150 mM-sodium chloride and 15 mM-sodium citrate), 10 × Denhardt’s solution (1 × Denhardt’s solution is 0.02% polyvinylpyrrolidone, 0.02% Ficoll and 0.02% bovine serum albumin), 100 µg/ml sonicated salmon sperm DNA and 0.1% SDS for 2 h at 65 °C. Hybridization was carried out at 65 °C overnight after addition of a ³²P-radiolabelled DNA probe prepared by random priming the CCV-specific insert purified from pBH5 (Sambrook et al., 1989). Following hybridization, the filter was washed twice at 65 °C with 2 × SSC, then washed three times at 42 °C with 0.2 × SSC, prior to exposure to X-ray film.
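
The SSC strengths used for prehybridization and washing follow directly from the stated composition of 1 × SSC. As a minimal sketch of that arithmetic (the function name `ssc_composition` is hypothetical and is not part of the original protocol), the component concentrations of each buffer can be worked out as follows:

```python
# Composition of 1 x SSC as stated in the text (concentrations in mM).
SSC_1X_MM = {"sodium chloride": 150.0, "sodium citrate": 15.0}


def ssc_composition(fold):
    """Return component concentrations (mM) for a given fold-strength of SSC."""
    return {name: conc * fold for name, conc in SSC_1X_MM.items()}


# Strengths used in the protocol: 5 x (prehybridization), 2 x and 0.2 x (washes).
for fold in (5, 2, 0.2):
    parts = ", ".join(
        f"{conc:.1f} mM {name}" for name, conc in ssc_composition(fold).items()
    )
    print(f"{fold} x SSC: {parts}")
```

The same scaling applies to Denhardt’s solution, whose 1 × composition is given as percentages rather than molarities.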


Plasmid    Approximate size (kb)
pBH5       1.8
pBH6       1.7
pBH7       2.6
pBH8       2.0
pBH9       3.0

oligo 7    5′ GTT GCA ATT GCG GCC GCA CAG TTA TTA TTG TTC    (NotI site: GCG GCC GC)
oligo 8    5′ CCC ATT GGC AAC GCG GCC GCT GTC ACC AAA ATT GGC    (NotI site: GCG GCC GC)

Fig. 1. Alignment of CCV cDNA clones with respect to the TGEV genome using partial sequence information. Oligonucleotides 7 and 8 were used as primers in a PCR reaction to obtain clone pBH6. Overlaps were confirmed by Southern blotting.


Southern blotting and other cloning procedures. These were carried out according to the protocols of Sambrook et al. (1989). Enzymes were used according to the manufacturers’ specifications (Boehringer Mannheim and New England Biolabs).


Results 


Generation and mapping of CCV clones 


To clone the 3′ end of the CCV genome we prepared a cDNA library from CCV genomic RNA. Inserts from recombinant clones of 1.8 kb or greater were selected for further analyses. In order to map the clones, we took advantage of the suspected nucleotide sequence homology between the genomes of CCV and TGEV. Partial sequencing of recombinant clones revealed identity in excess of 95%. This permitted initial alignment of the CCV clones with respect to the TGEV genome. This approach proved fruitful in that four clones were identified which spanned some 8.5 kb at the 3′ end (Fig. 1). A region at the 3′ end for which large clones were not represented in the library was prepared by PCR amplification (pBH6; Fig. 1). The relationships between putative overlapping clones were confirmed by Southern hybridization. Thus, partial sequencing and Southern blotting identified five overlapping clones which covered approximately 9.6 kb from the 3′ end of the CCV genome.


Shotgun DNA sequencing and sequence analyses 


The inserts from the plasmids detailed in Fig. 1 were sequenced using the shotgun methods of Bankier et al. (1987). The consensus nucleotide sequence of 9624 bp presented in Fig. 2 was analysed using the SAP programs of Staden (1982). Analysis revealed the presence of 10 open reading frames (ORFs) (Fig. 3). Pairwise alignment of these ORFs with their likely counterparts from other members of this coronavirus group disclosed very high levels of identity (Table 1) and indicated that the CCV structural proteins S, M and N are encoded by ORFs 2, 4 and 5, respectively. Each of the 10 ORFs is described in more detail below.
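
The identity values reported for such pairwise alignments are percentages of matching residues between two aligned sequences. The text does not state how gapped positions were treated, so the following sketch assumes that alignment columns containing a gap are excluded from the count. The function name `percent_identity` and the short peptide strings are illustrative only and are not CCV data.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over the ungapped columns of an alignment.

    Assumption: columns in which either sequence has a gap are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy example: 6 ungapped columns, 5 of which match.
print(round(percent_identity("MKKIL-F", "MKRILAF"), 1))
```

Other conventions (for example, counting gaps as mismatches) give lower values, so percentages are only comparable when the same convention is used throughout.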

With respect to subgenomic mRNA synthesis, it is known that the minimal conserved signal for transcription in this coronavirus group, CTAAAC, is identical in TGEV, PRCV and FIPV and is therefore likely to be conserved in CCV (as reviewed by Spaan et al., 1988). Indeed, analysis of the CCV sequence revealed that this sequence was present upstream of all the ORFs with the exception of the first and last. As ORF 1 is incomplete (see below), an additional CTAAAC sequence is presumably located at the 5′ end of the genomic RNA. When we analysed intracellular RNAs produced during CCV infection of canine A72 cells, eight species of RNA were observed (Fig. 4); the species observed between species 5 and 6 could not be accounted for in terms of the 3′-coterminal nested arrangement of coronavirus subgenomic RNAs and the observed positions of consensus transcription initiation signals. Taking into account the predicted size of each mRNA and the known location of the CTAAAC sequences, we predict a subgenomic message organization as depicted in Fig. 3. The ORFs encoded by each mRNA are described below.
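
Because the subgenomic RNAs form a 3′-coterminal nested set, the position of each CTAAAC signal gives a first estimate of the size of the corresponding message. The sketch below illustrates this reasoning on a short made-up sequence, not on the CCV sequence. The function names are hypothetical, and the computed lengths run from the start of the signal to the 3′ end of the given sequence only, so they ignore the leader sequence and depend on how much poly(A) is included.

```python
SIGNAL = "CTAAAC"  # minimal conserved transcription signal named in the text


def find_signals(seq, signal=SIGNAL):
    """Return 1-based start positions of every occurrence of the signal."""
    positions = []
    start = seq.find(signal)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based position
        start = seq.find(signal, start + 1)
    return positions


def nested_body_lengths(seq, signal=SIGNAL):
    """Length from each signal to the 3' end of seq (leader not included)."""
    return [len(seq) - pos + 1 for pos in find_signals(seq, signal)]


# Made-up 29-nucleotide example with two signals.
toy = "GGCTAAACATGAAATTTCTAAACATGCCC"
print(find_signals(toy))         # [3, 18]
print(nested_body_lengths(toy))  # [27, 12]
```

Comparing such predicted lengths with the sizes estimated from a Northern blot is one way to assign observed RNA species to individual signals.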

The numbering of CCV RNAs used here is based on that currently employed by workers studying the Purdue-115 and FS772/70 strains of TGEV. The RNA organization of CCV strain Insavc-1 is most closely related to that described for these TGEV strains. This numbering scheme is not, however, applicable in a straightforward fashion to all members of the antigenic group. In the case of the Miller strain of TGEV, an RNA originally designated 4b (Wesley et al., 1989) may be involved in the expression of ORF 3b and ORF 4; no additional RNA was detected between this RNA and the RNA coding for the membrane protein (RNA 5), but the
[Nucleotide sequence listing, positions 1 to 9620, with the deduced amino acid translations of the ORFs shown above the sequence.]

Fig. 2. Sequence of the extreme 3′ 9624 nucleotides of the CCV genome and the deduced amino acid sequences encoded by the ORFs. The consensus intergenic sequences are underlined. The octanucleotide sequence conserved in the 3′ non-coding region of all coronaviruses is shown in bold. The predicted ORFs are translated into the single-letter amino acid code. The putative signal peptides for S and M proteins are underlined.


[Diagram of the CCV genome and mRNAs 1 to 7.]

Fig. 3. Gene and subgenomic message organization predicted from the sequence data and Northern blot analyses cited in Results. Genes are designated according to the recommendations of the coronavirus study group (Cavanagh et al., 1990a, b). ORFs are represented by boxes. The vertical line in ORF 3b represents a stop codon and the black boxes represent leader sequences. Numbers represent ORFs encoded by that message.


Table 1. Pairwise sequence homology between CCV and FIPV, TGEV, PRCV and MHV ORFs

                                       Pairwise identity (%)
ORF*     Mr × 10⁻³     Amino acids    FIPV          TGEV          PRCV          MHV
1b       NK†           168‡           [illegible]   [illegible]   [illegible]   52.7
2 (S)    160           1452           [identity values illegible]
3a       8.6           [illegible]    [illegible]   83.5          [illegible]
3b       28.4§         251            NK            92.7          92.6
4 (SM)   9.3           82             NK            88.4          88.4
5 (M)    [illegible]   262            [identity values illegible]
6 (N)    43.4          401            76.4          89.6          86.9          27.3
7a       11.5          101            78.4          68.5          68
7b       [illegible]   212            [identity values illegible]

* ORF 3x is not included.
† NK, Not known.
‡ Incomplete.
§ Disregarding terminator, otherwise Mr = 4000.


Fig. 4. Northern blot analysis of IBV Beaudette mRNA size markers
(lane 1) and CCV strain Insavc-1 (lane 2) mRNAs. Unlabelled
intracellular RNAs were separated by formaldehyde gel
electrophoresis. The RNAs were transferred to a membrane filter and
hybridized with radiolabelled inserts IBV-N and pBHS, respectively.
IBV-N, a PCR product of the IBV Beaudette N gene, was kindly
supplied by Dr David Cavanagh. No CTAAAC motif which could give
rise to a messenger species was found between the CTAAAC motifs
associated with the messenger species 6 and 7.


possibility that such an RNA may be synthesized at a low
level must be considered because a CTAAAC signal was
observed. Similarly, in the case of FIPV strain 79-1146
no RNA has been detected between RNA 3 and the
membrane polypeptide RNA (de Groot et al., 1987) but
the possibility of an equivalent of the TGEV RNA 4 has
been alluded to (de Groot, 1989). Thus, the numbering
conventions employed do not deal adequately with the
variations in expression strategy observed in this region
of the genome within this group of closely related viruses.


ORFs encoded by mRNAs 1 and 2


ORF 1 is incomplete, has no AUG start codon, encodes
168 amino acids and terminates in a UGA stop codon
at position 510 (Fig. 2). A comparison of this ORF
with TGEV strain FS772/70 shows 99.2% similarity to
1b, and 47% and 52.7% identity to gene 1b of avian
infectious bronchitis virus (IBV) and MHV, respectively
(Britton & Page, 1990; Boursnell et al., 1987; Bredenbeek
et al., 1990). Thus, this ORF represents the 3’ end of
the putative polymerase-encoding region of genome
mRNA 1.

ORF 2, located immediately downstream of the
polymerase gene, would be translated from the 9.1 kb
subgenomic message 2. This ORF is 4356 nucleotides
long, representing 1452 amino acids with a calculated Mr
of 160K. Comparison of this ORF with sequences held in
the EMBL database reveals remarkably high identity to
the FIPV spike glycoprotein-encoding sequences
(91.1%) and, to a lesser degree, the porcine virus S genes
(Table 1), indicating that this is the CCV S gene. In some
strains of MHV, the haemagglutinin-esterase glycoprotein
gene (HE) is found downstream of the polymerase
gene (Luytjes et al., 1988) but it is clear that CCV, like
TGEV and IBV, encodes only the polymerase gene
upstream of the S gene (Britton & Page, 1990; Boursnell
et al., 1987).
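
The arithmetic behind the ORF 2 figures can be checked with a short sketch. It assumes a rule-of-thumb average residue mass of 110 Da; that constant and the function name `orf_summary` are illustrative assumptions, not the method used to calculate the Mr values reported here.

```python
# Rough check of ORF length -> residue count -> approximate molecular mass.
# ASSUMPTION: 110 Da is a rule-of-thumb mean residue mass, not a value
# taken from the paper.
AVERAGE_RESIDUE_MASS_DA = 110.0


def orf_summary(orf_length_nt, average_residue_mass=AVERAGE_RESIDUE_MASS_DA):
    """Return (codon count, approximate mass in kDa) for an ORF length in nt."""
    if orf_length_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    codons = orf_length_nt // 3
    approx_mass_kda = codons * average_residue_mass / 1000.0
    return codons, approx_mass_kda


codons, mass_kda = orf_summary(4356)
print(codons, round(mass_kda))  # 1452 160
```

Under this assumption the 4356-nucleotide ORF gives 1452 codons and roughly 160K, in line with the values quoted above for the unglycosylated S polypeptide.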

The CCV S protein shows features characteristic of a 
type I membrane protein, i.e. a putative signal sequence 
(Von Heijne, 1986; positions 506 to 563; Fig. 2) and 
transmembrane domain (positions 4682 to 4742; Fig. 2). 
There are also 30 potential N-glycosylation sites which 
probably account for the increased size of the S protein 
found in the virion (Garwes & Reynolds, 1981). 
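
Potential N-glycosylation sites of the kind counted above are conventionally located by scanning for the sequon Asn-X-Ser/Thr, where X is any residue except proline. The sketch below illustrates that scan; the paper does not state how its sites were identified, and both the function name `find_sequons` and the toy peptide are hypothetical.

```python
import re

# Sequon N-X-S/T with X != P. The lookahead lets overlapping sequons
# (e.g. N-N-T-S) each be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return 1-based positions of the Asn in each potential N-glycosylation sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Hypothetical toy peptide, not taken from the CCV S sequence.
print(find_sequons("MNGSANPTNNTS"))  # [2, 9, 10]
```

A sequon is only a potential site; whether it is used depends on protein topology and folding, which is why the text speaks of "potential" sites.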


ORFs encoded by mRNAs 3 and 4 


There are four ORFs distal to the S gene coding sequence
which are likely to be encoded by messages 3 and 4 (Fig.
3). Three of these have close similarity to their porcine
virus counterparts and have been named 3a (8.6K), 3b
(28.4K) and 4 (9.3K) (Table 1). The fourth ORF, which
to date has not been detected in this group of viruses,
could potentially encode a 71 amino acid protein with a
predicted Mr of 10K and overlaps ORFs 3a and 3b (Fig.
2 and 3). This ORF has been designated 3x. The CCV 3b
ORF was expected to encode a 28K protein like its
TGEV counterpart (Jacobs et al., 1986). However, this
strain of CCV has acquired a termination codon, UAA
(at position 5515; Fig. 2), which would result in a
truncated polypeptide of only 33 amino acids. Direct
sequencing of the viral genomic RNA and mRNAs has
confirmed the authenticity of this stop codon (data not
shown). The CCV 4 ORF encodes a small membrane
protein that is related to the 3c product of IBV (Fig. 5).
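
The effect of an acquired termination codon, as described for ORF 3b, can be illustrated by translating a reading frame until the first in-frame stop. This is a minimal sketch using the standard genetic code; the function name and the toy sequences are hypothetical and are not the CCV 3b sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_until_stop(cdna, start=0):
    """Translate from `start`; return (peptide, index of first in-frame stop or None)."""
    seq = cdna.upper().replace("U", "T")
    peptide = []
    for i in range(start, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[i:i + 3]]
        if residue == "*":
            return "".join(peptide), i
        peptide.append(residue)
    return "".join(peptide), None


# Toy example: a single C -> T change turns CAA (Gln) into TAA (stop).
print(translate_until_stop("ATGGCTAAACAAGGG"))  # ('MAKQG', None)
print(translate_until_stop("ATGGCTAAATAAGGG"))  # ('MAK', 9)
```

Applied to a real ORF, the length of the returned peptide gives the size of the truncated product directly.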

Message 4, as predicted from our sequence data, was 
detected in Northern blots (see Fig. 4). This message 
could only express ORF 4, as the proposed signal for 
transcription, CTAAAC, is found 43 nucleotides up- 
stream of the predicted ORF 4 start codon. This 
arrangement is found in a number of strains of TGEV. 
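
The spacing between a CTAAAC signal and the following start codon can be measured with a simple scan. The sketch below assumes the distance is counted from the first base of the motif to the first base of the next ATG; the text does not say which convention was used, and the function name and toy sequence are hypothetical.

```python
def motif_to_start_distances(seq, motif="CTAAAC", start_codon="ATG"):
    """For each motif occurrence, return (motif index, next ATG index, distance).

    ASSUMPTION: distance runs from the first base of the motif to the first
    base of the nearest downstream start codon (0-based indices).
    """
    seq = seq.upper().replace("U", "T")
    results = []
    pos = seq.find(motif)
    while pos != -1:
        atg = seq.find(start_codon, pos + len(motif))
        if atg == -1:
            results.append((pos, None, None))
        else:
            results.append((pos, atg, atg - pos))
        pos = seq.find(motif, pos + 1)
    return results


# Toy sequence built so that the ATG lies 43 nucleotides after the motif start.
toy = "GG" + "CTAAAC" + "G" * 37 + "ATGAAA"
print(motif_to_start_distances(toy))  # [(2, 45, 43)]
```

The nearest downstream ATG is not necessarily the start of the ORF of interest, so the output should be read against the annotated ORF positions.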


ORFs encoded by mRNAs 5, 6 and 7 


Messenger RNA species 5 and 6 encode ORFs which
resemble the coding sequences for the other coronavirus
structural proteins, M and N, respectively (Table 1).
Translation of poly(A)-selected CCV intracellular RNA
in the rabbit reticulocyte lysate system produced products
of the sizes expected for M and N when analysed
by SDS-PAGE (data not shown). ORFs 7a and 7b
are likely to be encoded on a single RNA species (mRNA
7) since smaller messages were not seen on Northern
blots, nor is another message predicted from the
sequence data. Furthermore, an equivalent RNA in
FIPV is thought to be bicistronic (de Groot et al., 1988)
and the levels of identity between the 7a and 7b ORFs of
CCV and the 6a and 6b ORFs of FIPV are 78.4% and
57%, respectively. Alignment of this region of CCV with
the related regions of TGEV and PRCV reveals that the
7a ORF of the porcine coronaviruses has undergone a
deletion of 69 nucleotides and furthermore they have no
counterpart to ORF 7b. Nevertheless, the CCV structural
protein ORFs, with the exception of S, have higher
identities to TGEV than to FIPV ORFs.


IBV-Beaudette  MMNLLNKSLEENGSFLTALYIIVGFLALYLLGRALQAFVQAADACCLEWYTWVVI
CCV-Insavc-1   MTFPRALTVIDDNGMVISIIFWFLLIIILILFSIALLNIIKLCMVCCNLGRIVIIV
TGEV-Miller    MTFPRALTVIDDNGMVISIIFWFLLIIILILLSIALLNIIKLCMVCCNLGRTVIIV
MHV-JHM        MFNLFLTDTVWYVGQIIFIVAVCLMVIIIVVAFPLASIKRCIQLCGLCNTLLLS
BCV-Mebus      MFMADAYFADTVWYVGQIIFIVAICLLVIIVVVAFLATFKLCIQLCGMCNTLGLS

               Hydrophobic region

IBV-Beaudette  PGAKGTAFVYKYTYGRKLNNPELEAVIVNEFPKNGWNNKNPANFQDAQRDKLYS
CCV-Insavc-1   PARHAYDAYKNFMQIRAYNPDEALLV
TGEV-Miller    PVQHAYDAYKNFMRIKAYNPDGALLV
MHV-JHM        PSIYLYNRSKQLYKYYNEEVRPPPLEVDDNIIOQTL
BCV-Mebus      PSIYVFNRGRGFYEFYNDVKPPVLDVDDV


Fig. 5. Alignment of the putative small membrane protein amino acid sequences from five different strains of coronaviruses. The
hydrophobic core is shown in bold. Asterisks represent conserved features.
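
Percentage identities such as those quoted above and in Table 1 are computed from aligned sequences. The sketch below shows one common convention: identical columns divided by all columns in which at least one sequence has a residue. The convention used for Table 1 is not stated in the text, so this choice and the function name `percent_identity` are assumptions. The two example sequences are the first rows for CCV and TGEV as printed in Fig. 5, and may carry transcription errors.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    ASSUMPTION: columns where both sequences have a gap are ignored;
    columns where only one has a gap count as mismatches.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# First alignment rows for CCV and TGEV as printed in Fig. 5 (56 residues each).
ccv = ("MTFPRALTVI" "DDNGMVISII" "FWFLLIIILI" "LFSIALLNII" "KLCMVCCNLG" "RIVIIV")
tgev = ("MTFPRALTVI" "DDNGMVISII" "FWFLLIIILI" "LLSIALLNII" "KLCMVCCNLG" "RTVIIV")
print(round(percent_identity(ccv, tgev), 1))
```

Different gap conventions can shift the reported value by several percentage points for sequences with large deletions, so the convention should be stated alongside any figure.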


Discussion 


In this study approximately 9.6 kb of the 3’ end of the 
CCV strain Insavc-1 genome was cloned and sequenced. 
This region is likely to include all of the viral genes 
excluding the polymerase gene for which only the 3’- 
terminal 168 amino acids have been determined. 
Therefore, a substantial part of the virus’ genetic 
information was available for comparison with other 
antigenically related coronaviruses, namely TGEV, 
PRCV and FIPV. The deduced sequence and genetic 
organization of CCV are shown in Fig. 2 and 3, 
respectively. 

From antigenic data and cross-infectivity studies, the 
viruses within this group have been termed ‘host range 
mutants’ (Horzinek et al., 1982). This close evolutionary 
relationship is emphasized by our analyses of the CCV 
sequence data. The CCV spike protein is closely related 
to the other spikes and has the features typical of 
coronavirus peplomer glycoproteins. Any variation in 
the sequence of this protein within the group presumably 
reflects changes in cell tropism, drift as a result of 
polymerase errors and selection by the host’s immune 
system. Similarly, interspecies comparison of the other 
structural proteins, M and N, revealed very high levels of 
identity (Table 1). Alignment of the M gene product 
amino acid sequences revealed that any variation was 
primarily found on what would be the exposed amino 
terminus of the protein (amino acids 22 to 44; Fig. 2), i.e. 
between the putative signal sequence (Von Heijne, 1986) 
and the first transmembrane domain. However, the 
single potential N-glycosylation site and the three 
cysteine residues are conserved. These cysteine residues 
are probably important in forming interchain disulphide 
bridges, as M of HCV-229E has been shown to form 
oligomers under non-reducing conditions (Arpin & 
Talbot, 1990). The variation in this region is again 
probably a result of selection pressure from the host’s 
immune system. Interestingly, alignment of the N gene 
amino acid sequences indicated that FIPV N has 
diverged to a greater extent than those of both CCV and 
TGEV (Fig. 6). This is unusual as N proteins are 
normally highly conserved; alignment of N gene amino 
acid sequences from five isolates of MHV showed at least 
90% identity (Masters et al., 1990). Nevertheless, 
variation was mainly clustered in two regions of the N 
molecule, between positions 204 and 210, and 352 and 
359 (Fig. 6). It has been proposed that these two loci 
represent spacers, which have little sequence specificity 
but connect conserved domains of the molecule involved 
in interaction with the RNA genome (Masters et al., 
1990). 


PRCV  MANQGQRVSWGDESTKIRGRSNSRGRKINNIPLSFFNPITLQQGAKFWNSCPRDFVPKGI  60
TGEV  MANQGQRVSWGDESTKTRGRSNSRGRKNNNIPLSFFNPITLQQGSKFWNLCPRDFVPKGI
CCV   MASQGQRVSWGDESTKRRGRSNSRGRKNNDIPLSFFNPITLEQGSKFWDLCPRDFVPKGI
FIPV  MATQGQRVNWGDEPSKRRGRSNSRGRKNNDIPLSFYNPITLEQGSKFWNLCPRDLVPKGI

PRCV  GNRDQQIGYWNRQTRYRMVKGQRKELPERWFFYYLGTGPHADAKFKDKLDGVVWVAKDGA  120
TGEV  GNRDQQIGYWNRQTRYRMVKGQRKELPERWEFYYLGTGPHADAKFKDKLDGVVWVAKDGA
CCV   GNKDQQIGYWNRQTRYRMVKGRRKNLPEKWEFYYLGTGPHADAKFKQKLDGVVWVARGDS
FIPV  GNKDQQIGYWNRQIRYRIVKGQRKELAERWFFYFLGTGPHADAKFKDKIDGVFWVARDGA

PRCV  MNKPTTLGSRGANNESKALKFDGKVPGEFQLEVNQSRDNSRSRSQSRSRSRNRSQSRGRQ  180
TGEV  MNKPTTLGSRGANNESKALKFDGKVPGEFQLEVNQSRDNSRSRSQSRSRSRNRSQSRGRQ
CCV   MTKPTTLGTRGTNNESKALKFDVKVPSEFHLEVNQLRDNSRSRSQSRSQSRNRSQSRGRQ
FIPV  MNKPTTLGTRGTNNESKPLRFDGKIPPQFQLEVNRSRNNSRSGSQSRSVSRNRSQSRGRH

PRCV  QSNNKKDDSVEQAVLAALKKLGVYTEKQQQRSRSKSKERSNSKTRDTTPKNENKHTWKRT  240
TGEV  QFNNKKDDSVEQAVLAALKKLGVDTEKQQQRSRSKSKERSNSKTRDTTPKNENKHTWKRT
CCV   LSNNKKDDNVEQAVLAALKKLG [remainder illegible]
FIPV  HSNNQ-NNNVEDTIVAVLEKLGV-TDKQ--RSRSKPRERSDSKPRDTTPKNANKHTWKKT

PRCV  AGKGDVTRFYGARSSSANFGDSDLVANGSSAKHYPQLAECVPSVSSILFGSYWTSKEDGD  300
TGEV  AGKGDVTRFYGARSSSANFGDTDLVANGSSAKHYPQLAECVPSVSSILFGSYWTSKEDGD
CCV   AGKGDVTKFYGARSSSANFGDSDLVANGNGAKHYPQLAECVPSVSSILFGSHWTAKEDGD
FIPV  AGKGDVTTFYGARSSSANFGDSDLVANGNAAKCYPQIAECVPSVSSIIFGSQWSAEEAGD

PRCV  QIEVTFTHKYHLPKDHPKTEQFLQQINAYACPSEVAKEQRKRKSRSKSAE [remainder illegible]
TGEV  QIEVTFTHKYHLPKDDPKTGQFLQQINAYARPSEVAKEQRKRKSRSKSAERSEQDVVPDA
CCV   QIEVTETHKYHLPKDDPKTGQFLQQINAYARPSEVAKEQRQRKARSKSVERVEQEVVPDA
FIPV  QVKVTLTHTYYLPKDDAKTSQFLEQIDAYKRPSEVAKDQRQRRSRSKSADKKPEEL-SVJ

PRCV  LIENYTDVFDDTQVEMIDEVTN
TGEV  LIENYTDVFDDTQVENIDEVTN
CCV   LTENYTDVFDDTQVEIIDEVTIN
FIPV  LVEAYTDVFDDTQVEMIDEVTN


Fig. 6. Alignment of the nucleocapsid protein amino acid sequences from PRCV strain 86/137004 (Britton et al., 1991), TGEV strain
FS772/70 (Britton & Page, 1990), CCV strain Insavc-1 (this paper) and FIPV strain 79-1146 (Vennema et al., 1991) using the CLUSTAL
program (Higgins & Sharp, 1989). The asterisks below the sequences show identical amino acids in all four viruses and a dot is used if
there has been a conservative substitution. The minus signs represent deletions. The boxed areas represent putative spacer regions
(Masters et al., 1990).


The ORFs that lie between the S and M genes have, 
like the other ORFs so far analysed, a high degree of 
identity to their porcine virus counterparts (Table 1) 
and presumably perform similar functions. A previously 
undetected ORF, 3x, was identified which could poten- 
tially encode a 10K polypeptide. However, codon usage 
and base preference programs of Staden (1982) suggest 
that this ORF does not encode a functional viral protein. 
Furthermore, the proximal AUG is in a poor context for 
translation initiation (Kozak, 1986) and the only other 
AUG is found at the very 3’ end of the coding sequence. 
Therefore, it is very unlikely that this ORF is expressed 
in this strain of CCV and it probably represents an 
evolutionarily redundant sequence which is no longer 
required by the virus. Analysis of TGEV genomic 
sequence in this region revealed a counterpart for this 
canine virus pseudogene; 92 nucleotides have, however, 
been deleted. This deletion also results in a frameshift in 
the sequence which explains why this ORF has not 
hitherto been noticed (Fig. 7). In addition to the likely 
non-functionality of ORF 3x, it is also unlikely that ORF 
3b is expressed in this strain of CCV. Although a 
transcription signal, CTAAAC, is present upstream of 
ORF 3b (Fig. 2, position 5213), we were unable to detect 
an mRNA of the predicted size on Northern blots. Even 
if low-level transcription occurs from this site, it is 
unlikely that ORF 3b is expressed as there is a 
termination codon (UAA) some 93 nucleotides down- 
stream of the first AUG and subsequent AUG codons 
are in poor contexts for ribosome binding (Kozak, 1986). 
In fact, in vitro transcription and translation of this ORF 
did not yield any discernible products by SDS-PAGE 
analysis (data not shown). Alignment of ORF 4 amino
acid sequences disclosed features in common with the
IBV 3c protein (Fig. 5). This protein is found in the viral
envelope (Smith et al., 1990) and it has been suggested
that it may be translated from IBV mRNA 3 by a
cap-independent mechanism (Liu, 1991). However, CCV
strain Insavc-1 expresses a message species, mRNA 4,
appropriate for conventional cap-dependent expression
of ORF 4. This RNA is difficult to detect since it is very
similar in size to the abundant M message and is present
in low abundance, possibly as the result of a suboptimal
RNA transcriptional leader binding site, CTAAAC,
which is found 43 bp upstream of the AUG start codon of
ORF 4.


CCV Insavc-1    MLNLVSLLLKKSIVIQLEDI
TGEV FS772/70   MLSLVSPLFKK---------
PRCV 86/137004  --------------------

CCV Insavc-1    TVYKFKAKFWYKLPFETRLR
TGEV FS772/70   --------------------
PRCV 86/137004  --------------------

CCV Insavc-1    IIKHTKPKALSATKQVKRDY
TGEV FS772/70   --KHTKSKALSVTKQLKRDY
PRCV 86/137004  RSKHTKSKALSVTKQLKRDY

CCV Insavc-1    RKTAILNSMRK
TGEV FS772/70   RKTVILNEMRK
PRCV 86/137004  RKTRTKLCENDWWTES


Fig. 7. Alignment of the TGEV and PRCV amino acid sequences with
the CCV pseudogene 3x, which overlaps ORFs 3a and 3b. Circles with
arrows represent the start of ORF 3b. The end of the 3a ORFs in CCV,
TGEV and PRCV are indicated by the symbols +, A and Q,
respectively. Deletions are represented by minus signs. The intervening
sequence between 3a and 3b ranges from 60 to over 200 nucleotides
between these strains, indicating that deletions may occur in this region
at a higher frequency than in the surrounding sequence.
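
The translation-initiation argument made above for ORFs 3x and 3b rests on the sequence context of each AUG (Kozak, 1986). A minimal sketch of such a check follows, assuming the widely used simplification that a purine at position -3 and a G at position +4 are the key determinants. The three-level labels, the function name and the toy sequences are illustrative assumptions, not the criteria applied in this study.

```python
def kozak_context(mrna, aug_index):
    """Classify the context of the AUG starting at `aug_index` (0-based).

    ASSUMPTION: 'strong' = purine at -3 and G at +4, 'adequate' = one of
    the two, 'weak' = neither. Positions are relative to the A of AUG (+1).
    """
    seq = mrna.upper().replace("T", "U")
    if seq[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")
    minus3 = seq[aug_index - 3] if aug_index >= 3 else None
    plus4 = seq[aug_index + 3] if aug_index + 3 < len(seq) else None
    has_purine = minus3 in ("A", "G")
    has_g = plus4 == "G"
    if has_purine and has_g:
        return "strong"
    if has_purine or has_g:
        return "adequate"
    return "weak"


# Hypothetical toy contexts, not taken from the CCV sequence.
print(kozak_context("GCCACCAUGG", 6))  # strong
print(kozak_context("UUUUUUAUGC", 6))  # weak
```

A weak context does not exclude initiation, so a result of this kind supports, but does not replace, the in vitro translation evidence cited above.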

The degree of variability in the lengths of the non- 
coding sequences that lie upstream and downstream of 
ORF 3a in members of this antigenic group is striking. 
The lengths of these sequences range from 40 bp to over 
200 bp (Fig. 7). Alignment of the ORF 3a amino acid 
sequences reveals that, in addition, variation is found at 
the ends of these coding sequences. Perhaps the non- 
coding regions proximal and distal to 3a are ‘hot spot 
regions’ where recombination, insertions or usually 
deletions can occur at a higher frequency relative to the 
surrounding sequences. The dynamism of the genome is 
well documented in coronaviruses (Keck et al., 1988; 
Kusters et al., 1989) and may be related to the propensity 
of the replicase complex to fall off its template and then 
to reinitiate RNA replication on the same or a different 
template. It would appear that there are three regions 
where deletions can occur at a higher frequency: within 
S, between S and M, and downstream of N. The 
polymorphism of S found in MHV strains with differing 
passage histories is mainly due to deletions in that gene 
which can lead to deletions of up to 159 amino acids. 
Consequently, this has an effect on pathogenicity, as 
deletions in the MHV-4 S coding sequence apparently 
result in a loss of ability to induce fatal encephalitis and 
the acquisition of a non-fatal demyelinating disease in 
mice (Parker et al., 1989). Polymorphism has also been 
observed in the S gene and in the region between the 
S and M genes for different strains of TGEV and the 
respiratory tract mutant, PRCV (Wesley et al., 1990; 
Rasschaert et al., 1990). In fact, an IBV strain 
(Port/322/85) has been reported which appears to have 
arisen as a result of recombination between the M and S 
genes from two other strains of IBV (Cavanagh et al., 
1990). The third ‘hot spot region’ is found downstream 
of the N gene. The porcine coronaviruses have a 69 
nucleotide deletion in ORF 7a and ORF 7b is not present 
(de Groot et al., 1988). This phenomenon is not unique to 
the coronaviruses from this antigenic group. Deletions of 
up to 170 nucleotides are found downstream of the N 
gene in some strains of IBV (Collisson et al., 1990). 

CCV ORF 7b has 57% identity to FIPV 6b. This ORF 
is the least conserved between the two viruses. Whether 
the protein produced from this ORF plays an important 
role in the immune-mediated disease seen in felines 
remains to be seen as all the viruses from this antigenic 
group can infect cats but only FIPV will produce this 
disease. 

In conclusion, sequencing and subsequent analyses 
stress the very close relationship CCV has to the other 
viruses within its antigenic group. We must, however, be 
careful when generalizing about the CCV sequence data 
from this limited information. Coronavirus genomes 
are dynamic, subject to recombination, insertion and 
deletion, and as a consequence strains may show sig- 
nificant genetic differences. Clearly, there is a need to 
clone and sequence other strains in order to build a 
consensus picture of the CCV genome. 
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